perm filename PASTE[L70,TES] blob
sn#017425 filedate 1972-12-14 generic text, type T, neo UTF8
COMMENT ⊗ VALID 00010 PAGES
RECORD PAGE DESCRIPTION
00001 00001
00002 00002 CHAPTERS AND PARTS OF CHAPTERS FOR PAPERS AND MANUALS
00003 00003 OUTLINE: TUTORIAL MANUAL-ONLY
00005 00004 P-INT INTRODUCTION
00007 00005 P-RR REWRITE RULES
00009 00006 P-LIT LITERALS
00012 00007 P-RF REWRITE FUNCTIONS
00014 00008 P-PRIV PRIVATE VARIABLES
00018 00009 P-EL ELLIPSIS
00019 00010 P-PREC PRECEDENCE OF RULES
00022 ENDMK
⊗;
CHAPTERS AND PARTS OF CHAPTERS FOR PAPERS AND MANUALS
TUTORIAL ARTICLE ON "PATTERN DIRECTED STREAM PROCESSING"
STRESS USES, IMPLEMENTATION, HISTORY
CF. FORMAT DECLARATIONS, RPG, DECISION TABLES
PROBABLY OMIT CAUSATION, MEXPRS, PUBLIC VARIABLES, OR PUT IN AFTERNOTE
A.I. CONFERENCE
ASSUME KNOWLEDGE OF LISP, PATTERN MATCHING, AND BACKTRACKING
STRESS ORGANIZATION OF DATA BASE INTO PARTIAL FUNCTIONS;
IMPLICIT INVERSES; STREAMING; PRECEDENCE; EXTENSIBLE LANGUAGE
SIGPLAN
STRESS EXTENSIBILITY AND MACHINE INDEPENDENCE
OUTLINE: TUTORIAL MANUAL-ONLY
P- PATTERN DIRECTED COMPUTATION
INT INTRODUCTION
RR REWRITE RULES
LIT LITERALS ` ' ≡ ( )
RF REWRITE FUNCTIONS
PRIV PRIVATE VARIABLES :
EL ELLIPSIS ..
PREC PRECEDENCE OF RULES ≤≥
LISP LISP SUBSET
EVAL EVALUATION #
APPL APPLICATION OMIT @ {} *
SEG SEGMENTS :: ## ... @@ ** $
PRED PREDICATION ?
CONV CONVERSION <>
M- MLISP INTERFACE OMIT
MEXPR MEXPRS IN PEXPRS [] /
PEXPR PEXPRS IN MEXPRS {}
PUBL PUBLIC VARIABLES IN P % %% %# %() %[] %#() %#[]
E- EXTENSION
EXT EXTENDING REWRITE FNS
USES USES FOR EXTENSIBLE FUNCTIONS
C- CAUSATION OMIT
SIDE SIDE EFFECTS &
DEMON INSTIGATING CAUSES &
I- IMPLEMENTATION
LR TOP-DOWN LEFT-RIGHT
BACK BACKTRACKING; LEFT-RECURSION; →→
FACT FACTORING OF RULES
ALTER EXTENDING
SD SOURCE AND DESTINATION
PEEK LOOK AHEAD TECHNIQUES
SPEC SPECIAL CASE DETECTION
PARA PARALLEL COMPUTERS
P-INT INTRODUCTION
↓_Pattern-directed computation_↓ involves two kinds of operations on
data structures: ∪decomposition and ∪recomposition. Decomposition
breaks down an ↓_input stream_↓ into components under the direction
of a ↓_decomposing pattern_↓ ("dec"). The inverse operation,
recomposition, constructs an ↓_output stream_↓ under the direction of
a ↓_recomposing pattern_↓ ("rec").
The distinguishing characteristic of pattern-directed computation is
that it appears to be directed by a pattern rather than by an
algorithm, i.e., by a picture rather than by instructions. Of course,
to implement it on a conventional computer, the pattern must be
translated to or interpreted by an algorithm.
Decomposition is sometimes called "pattern-matching", and its
interpreters are called "pattern-matchers". See [MAD DOCTOR, ELIZA,
SNOBOL, COMIT, FLIP, PLANNER, QA-4, FLEX, MLISP-2, ...].
The abilty to use pictures to guide decomposition facilitates the
programming of table-driven syntactic parsers and structured-data
information retrieval systems. Pattern-driven recomposition simplifies
the specification of code generators, output formats, and
structured-data transformations.
P-RR REWRITE RULES
A ↓_rewrite rule_↓ is of the form:
.I
dec → rec
It defines a partial function on streams as follows: if the input stream
matches the dec, then the output stream is generated by the rec.
The following rewrite rule is part of the "square" function:
.I
5 → 25
If the input stream consists solely of the token `5' (a number), then the output
stream consists solely of the token `25'.
The following rewrite rule
is part of the "square root" function:
.I
25 → -5 5
If the input stream consists solely of the token `25', then the output
stream consists of the two tokens `-5' and `5'.
The next rule could
be part of a question-answering function:
.I
`HOW ARE YOU?' → `VERY WELL AND YOU?'
If the input stream consists of the four tokens:
.I
HOW ARE YOU ?
then the output stream consists of the five tokens:
.I
VERY WELL AND YOU ?
P-LIT LITERALS
In the above examples, the input and output streams contained only
↓_atomic tokens_↓ such as numbers and words. Sometimes, a token is a
non-atomic (structured) entity such as a ↓_list_↓. A list will be represented
by a pair of parentheses enclosing its elements written in left-to-right
order. Thus, the following rewrite rule is part of the "transpose" function:
.I
((A B C) (D E F)) → ((A D) (B E) (C F))
It takes an input stream containing the single token ((A#B#C)#(D#E#F)) and
generates an output stream containing the single token ((A#D)#(B#E)#(C#F)).
The lists in this case are meant to represent a 2x3 and a 3x2 matrix.
Different pattern-matchers employ a variety of notations to express similar
concepts. In the notation of this paper, single quotes will be
used to surround atomic literals. Several
atomic literals may appear within a single
pair of quotes. Each literal contains either a
series of alphanumeric characters or a single non-alphanumeric character.
Some characters have
special meaning between single quotes:
.I
` ' : ≡
To use such a character as a literal, it must be preceded by
a ≡ to override its special meaning. Example:
.I
`I SAY≡: WHAT≡'S UP?' → `NOT MUCH.'
This rule matches an input stream containing the eight tokens:
.I
I SAY : WHAT ' S UP ?
and generates the output stream:
.I
NOT MUCH .
The problems of separating a character-oriented input stream into tokens
and of editing superfluous spaces out of a character-oriented output stream
will be discussed later. It is henceforth assumed that this has been done.
P-RF REWRITE FUNCTIONS
A rewrite rule defines a partial function, for example, the mapping of
some particular token into some other particular token. A broader
partial function can be defined as the union of several rewrite rules.
A ↓_rewrite function definition_↓ is of the form:
.B
<name> = dec1 → rec1 ;
= dec2 → rec2 ;
.
.
.
= decn → recn ;
.E
For example, a larger part of the "square function" than shown earlier, now
encompassing six cases, can be defined by:
.B
<square>= 0 → 0 ;
= 1 → 1 ;
= 2 → 4 ;
= 3 → 9 ;
= 4 → 16 ;
= 10 → 100 ;
.E
Thus, square(1)=1 and square(10)=100, but square(6) is undefined.
The rules comprising a rewrite function definition may appear in any order.
Note that a partial inverse function often can be derived by interchanging the dec
and rec in every rule. Thus, the above definition defines implicitly:
.I
<inverse square>= 0 → 0 ;
= 1 → 1 ;
= 4 → 2 ;
= 9 → 3 ;
= 16 → 4 ;
= 100 → 10 ;
.E
P-PRIV PRIVATE VARIABLES
A function is difficult to define if every case must be enumerated.
Therefore, pattern matchers allow ↓_variables_↓ to appear in patterns.
The value of a variable can be either a list or an atomic token.
In this paper, the notation:
.I
:X
where X is any identifier, will denote the ↓_private variable named X_↓.
The private variables of each rule are distinct from the private variables
of all other rules, even if their names are the same; hence the adjective
"private".
The following definition has only three rewrite rules but handles an
unlimited number of input streams:
.B
<reply> = `HOW ARE YOU?' → `VERY WELL, AND YOU?' ;
= `HOW IS :X?' → `I HAVE NOT SEEN :X.' ;
= `DID :X GO TO :Y?' → `WHY DON≡'T YOU ASK :X YOURSELF?' ;
.E
For example, reply(DID#JOHN#GO#TO#SCHOOL?) = WHY#DON'T#YOU#ASK#JOHN#YOURSELF?
and reply(DID#SAM#GO#TO#WORK?) = WHY#DON'T#YOU#ASK#SAM#YOURSELF?.
There are also an infinite number of input streams not handled by the above
definition. For example, reply(SEE#YOU#LATER.) is undefined. Thus, this
version of "reply" is still a partial function. In fact, most rewrite
function definitions do define partial functions. One exception is:
.I
<identity> = :Y → :Y ;
A private variable can appear more than once in a single dec pattern. It
must match identical tokens at each appearance. Example:
.I
<answer>= `WILL :X OR WON≡'T :X?' → `I DON≡'T KNOW.' ;
Thus, answer(WILL#ALICE#OR#WON'T#ALICE?) = I#DON'T#KNOW., but
answer(WILL#JANE#BUT#WON'T#FRED?) is undefined.
Systematic renaming of the private variables or any rewrite rule does not
change the meaning of the rule. The following two rules are equivalent:
.B
:X (:X :Z) → (:Z :Z)
:YA (:YA :R) → (:R :R)
.E
If the input stream contains the two tokens:
.I
7 (7 (HELLO JOE))
then the atom 7 will be bound to X (or to YA) and the list (HELLO#JOE) will
be bound to Z (or to R). The output stream from either rule will be:
.I
((HELLO JOE) (HELLO JOE))
P-EL ELLIPSIS
To make patterns easier to read and write, the double-period ellipsis symbol
can be used to symbolize an unnamed private variable. Thus, the following
two rules are equivalent:
.B
`IS :X COMING?' → `NO, :X COULD NOT MAKE IT.'
`IS .. COMING?' → `NO, .. COULD NOT MAKE IT.'
.E
If an ellipsis occurs several times in a pattern, it designates a
different variable each time. The n'th double-period in the dec
designates the same variable as the n'th double-period in the rec.
The following two rules are equivalent:
.B
:X (:Y :Z) → (:X :Y)
.. (.. ..) → (.. ..)
.E
P-PREC PRECEDENCE OF RULES
It is possible that a given input stream will match the decs of more than
one rule of a rewrite function. For the time being, it will be assumed that
one and only one of those rules that match will be applied. The decision as
to which one applies will be based on criteria outlined
below. This assumption is essentially ↓_deterministic_↓. If the pattern-matcher
were embedded in a non-deterministic system (implemented by backtracking or by
multiple processes), then more than one rule could be applied. Such a situation
will be discussed later in this paper.
In many pattern matchers, the rules are ordered by the programmer, and the
first rule that matches is applied. But in the pattern matcher of this paper,
the programmer may write rules in any order. This makes it easier for him
to add a new rule to a rewrite function, because it is usually unnecessary to
study the rules that are already there. However, it requires the pattern
matcher to utilize ↓_precedence criteria_↓ to impose an ordering on the rules.
The basic precedence criterion is that the rule with the most specific dec pattern
comes first. For example, if part of the "reply" function is:
.B
<reply> = `I SEE :X.' → `SO WHAT?' ;
= `I SEE SUE.' → `WHERE?' ;
.E
then the input stream `I SEE SUE.' matches the decs of both rules. However, the
second rule is more specific since it only applies to SUE whereas the first rule
applies to anyone. Therefore, reply(I#SEE#SUE.) = WHERE?#.